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Abstract: Consider a model parameterized by a scalar parameter of interest 
and a nuisance parameter vector. Inference about the parameter of interest may 
be based on the signed root of the likelihood ratio statistic R. The standard 
normal approximation to the conditional distribution of R typically has error 
of order 0(n -1 / 2 ), where n is the sample size. There are several modifications 
for R, which reduce the order of error in the approximations. In this paper, 
we mainly investigate Barndorff-Nielsen's modified directed likelihood ratio 
statistic, Severini's empirical adjustment, and DiCiccio and Martin's two mod- 
ifications, involving the Bayesian approach and the conditional likelihood ratio 
statistic. For each modification, two formats were employed to approximate the 
conditional cumulative distribution function; these are Barndorff-Nielson for- 
mats and the Lugannani and Rice formats. All approximations were applied 
to inference on the ratio of means for two independent exponential random 
variables. We constructed one and two-sided hypotheses tests and used the 
actual sizes of the tests as the measurements of accuracy to compare those 
approximations. 



1. Introduction 

When analyzing data arising from a model with a single unknown parameter, statis- 
ticians frequently build tests of a simple null hypothesis around the likelihood ratio 
statistic, since the signed square root of the likelihood ratio statistic, R, often has 
a distribution that is well-approximated by a standard normal distribution under 
the null hypothesis. In the presence of nuisance parameters, the statistic R depends 
on the nuisance parameters. Practitioners often replace the nuisance parameters 
in the likelihood function by their maximum likelihood estimates and examine the 
resulting profile likelihood as a function of the parameter of interest. Denote by n 
the sample size. The standard normal approximation to the conditional distribu- 
tion of R typically has error of order <3(n -1 / 2 ), and R can be used to construct 
approximate confidence limits for the parameter of interest having coverage error 
of that order. In large sample settings, this approximation works well. However, in 
small sample situations, with 10 or 15 observations, the standard normal approxi- 
mation may not be adequate. Hence, various authors developed modifications for R 

*The authors were supported by Grant DMS 0505499 from the National Science Foundation. 
The authors are grateful for the assistance of one of the volume editors, Bill Strawderman, and 
an anonymous referee. 

1 Department of Statistics and Biostatistics, Hill Center, Bush Campus, Rutgers, The 
State University of New Jersey, 110 Frelinghuysen Rd., Piscataway, NJ 08854 USA, e-mail: 
j anezhOstat . rutgers . edu , kolassaOstat . rutgers . edu 

AMS 2000 subject classifications: 62E60, 41A58. 

Keywords and phrases: modified signed likelihood ratio statistic, saddlepoint approximation, 
conditional cumulative distribution. 



250 



Saddlepoint conditional distribution function approximations 



251 



using saddlepoint approximation techniques. These modifications reduce the order 
of error in the standard normal approximation to the conditional distribution of R. 

Barndorff-Nielsen Q first proposed the modified directed signed root of the like- 
lihood ratio statistic R*. This statistic will be reviewed in the next section. The 
relative error in the standard normal approximation to the conditional distribu- 
tion of R* is of order 0(>- 3 / 2 ). Barndorff-Nielsen dHH also considered using a 
variation on this approximation, of the same form as the univariate expansion of 
Lugannani and Rice 10]. The drawback of these approximations is that the calcula- 
tion their calculation requires the calculation of an exact or approximate ancillary, 
and in some situations it is hard or impossible to construct this ancillary. For the 
other approximations that we will study in the following, no such ancillary needs 
to be specified, and hence the approximations are easier to apply in practice. 

beverim proposed an approximation R* to Barndorff-Nielsen's R* based on 
empirical covariances. Under some assumptions and model regularity properties, 
R* is distributed according to a standard normal distribution, with error C^n" 1 ), 
conditionally on the observed value of an ancillary statistic A. However, the con- 
struction of this R* does not require the specification of A. 

DiCiccio and Martin Q proposed an alternative quantity to R*, denoted by 
R + , that is available without specification of A. The derivation of R + involves the 
Bayesian approach to constructing confidence limits considered by Welch and Peers 
[l5| and Peers 11 1 . In the presence of nuisance parameters, Peers [ll[ chose a prior 
density for the parameters to satisfy a partial differential equation. With this prior, 
the standard normal approximation to the conditional distribution of R + has error 
of order 0(n _1 ). If the parameter of interest and the nuisance parameter vector 
are orthogonal, solving the partial differential equation is relatively easier. In some 
cases that the parameters are not orthogonal, solving that equation numerically is 
problematic. Parameter orthogonality will be reviewed in the following section. 

For a parameter of interest that is orthogonal to the nuisance parameter vector, 
Cox and Reid [6( defined the signed root of the conditional likelihood ratio statistic 
R. The standard normal approximation to the distribution of R has error of order 
0{n~ 1 / 2 ). DiCiccio and Martin Q defined R + similar as the R + mentioned above. 
The standard normal approximation to the conditional distribution of R + has error 
of order 0(rt _1 ). The use of R and its modifications is often effective in situations 
where there are many nuisance parameters. In such cases, the use of R and its 
modified versions can produce unsatisfactory results; see DiCiccio, Field and Fraser 
for examples. 

The above variants on R have never been systematically compared to each other 
as a group. This paper provides an accuracy comparison among the modifications 
stated above. Each of these approximations are used to generate an approximate 
one-sided p- value by approximating P[R > r], for r the observed value of R. Approx- 
imate two-sided p- values are calculated by approximating 2min(P[i? > r],P[J? < 
r]). One and two-sided hypotheses tests of size a may be constructed by rejecting 
the null hypothesis when the p-value is less than a. Both the Barndorff-Nielson 
format approximation 

(1) ^{R + RrHogiU/R)} 
and the Lugannani and Rice format approximation 

(2) $(R) + ^(R- 1 - U- 1 ) 

were considered in this paper, where the variable U may vary for different modifica- 
tions. We examined as an example the ratio of means of independent exponentials. 
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We calculated via simulation the size of tests constructed as above, and then com- 
pared the results among different approximations. 

2. Methodology 

We first review several statistics whose marginal distributions are very close to 
standard normal. Consider continuous variables X\, . . . ,X n having joint density 
function that depends on an unknown parameter uj = (u>i, . . . ,u>d)- Suppose that 
w = {tpt x)j where ip = u>x is a scalar parameter of interest and x = ( w 2, ■ ■ • , ^>d) is 
a nuisance parameter vector. Let Cj = (ip>x) be the maximum likelihood estimator 
of u;, and for fixed if), let Xip be the constrained maximum likelihood estimator of 
X- The signed root of the likelihood ratio statistic is R = sgn(?/> — ipo){2(l(u>) — 
l(ipo, Xo))} 1 ^ 2 ' where Xo wm ^e shorthand for Xip an d K u ) ls the log-likelihood 
function for u). The standard normal approximation to the distribution of R typ- 
ically has error of order 0(n -1 / 2 ), and R can be used to construct approximate 
confidence limits for if) having coverage error of that order. 

The earliest general conditional saddlcpoint tail probability approximation was 
provided by Skovgaard (l3j , who applied double saddlepoint techniques to the prob- 
lem of approximating tail probabilities for conditional distributions when the data 
arise from a full exponential family. In this case the double saddlepoint distribution 
function approximation can be expressed in terms of the quantities in the joint 
density function. Skovgaard's double saddlepoint approximation to the conditional 
distribution function is of form with U a Wald statistic. In this paper, we 
consider only models more complicated than canonical exponential families, and so 
won't apply this approximation. 



2.1. Barndorff- Niels en's modification 

The modified signed root of the likelihood ratio statistic R* was first proposed by 
Barndorff-Nielsen Q and given by 

R* =R + R- 1 \og{U/R) 1 

where 

\j xx (u^)\i\j(u>)\i 

and Jxxi^i') ~ ^'xxt^Oiio) an< ^ ~ ~ ^uiu»(w), with (^(w) the matrix of 

second-order partial derivatives of Z(a;;d>, A) taken with respect to oj and I xx (uj) 
the submatrix of l uu (it}) corresponding to X- Here U represents an approximate 
conditional score statistic, which, in the multivariate normal case would exactly 
coincide with R. Outside the multivariate normal case, it measures the difference 
between R and U is a measure of departure from normality. The quantity l-u,(u>) 
is the d x 1 vector of partial derivatives of Z(u>;a7,A) taken with respect to u>, 
and I x -^(lj) is a d x (d — 1) matrix of mixed second-order partial derivatives of 
A) taken with respect to x an d The sign of U is the same as that 
of R and the resulting U is of the form U = R + O p (n~ 1 / 2 ). The relative error 
in the standard normal approximation to the conditional distribution of R* is of 
order 0(n~ 3 / 2 ). The conditioning is on an exact or approximate ancillary statistic 
A. The variable U is parameterization invariant and does not depend on %. 
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The value of tpo satisfying $(i?*) = a is an approximate upper 1 — a confidence 
limit which has relative coverage error of order 0(n~ 3 / 2 ) both conditionally and 
unconditionally. Barndorff-Nielsen [H, 0, H| also considered using the alternative to 
provided by the Lugannani and Rice format approximation ([2]). 

Consider the exponential family model for a random vector T whose density 
evaluated at t is /r(t; 9) = cxp(0 T t — Ht(0) — G(t)). The random vector T is the 
sufficient statistic and set t(9) = Eg[T]. In the presence of nuisance parameters, 
the calculation of U requires the specification of the ancillary A. Barndorff-Nielsen 
[H suggested an approximate ancillary statistic for use in conditional inference. 
Kolassa [jj, in Chapter 8.4, presented this approximate ancillary A as 

B(V-)(T-t(V-,x)) t , 

with x held fixed, and 

B(if>) = [(aT/^)- L E(aT/^)- LT ]-*(er/^)- L . 

Suppose that 9 is scalar. Let 1(9; 9, a) = 1(9; 8, a)/n. Then 

(4) Fg |A (0|a;0) = [*{y/n&) + - l/i]/VH ][1 + OpirT 1 )], 

with z = [l' ,1 (9;9,a.) — f ,1 (9; 0, a)]/y j(6>), and the superscripts ;1 on [ ;1 represent 

differentiation of the likelihood with respect to 0, after reexpressing t in terms of 
9 and a. Here a is the observed value of A; Fq^ a (9\sl; 9) is the conditional cumula- 
tive distribution function and $(•) is the standard normal cumulative distribution 
function. 

In the computation of Barndorff-Nielsen's R*, the calculation of U requires the 
ancillary A to be specified, which may present difficulties in practice. In the fol- 
lowing, we will introduce several modifications that do not require the specification 
of A. 

2.2. An empirical adjustment 

Severini [l2| proposed approximation R* to Barndorff-Nielsen's R* based on em- 
pirical covariances. Recalling the formula of C/ (J3J), the key step is to approximate 
l-x^faip) and L t &(u>) — l&(u>,p). 

Let l^(u>) denote the log-likelihood function based on observation j alone. De- 
note 

Q(u;u ) =X>->( w )iO->( Wo f , I(u,;u; ) = Y,l ( J ) ni { jH"v) T , 

and i — I(Cj;Cj). The quantity ujq is any point in the parameter space. Then 
l;u>(&) — l;&(u) and Z u;( i(ci;) may be approximated by 

hati- t &(u) - l,&(u>) = {Q(&; u>) - Q(a>; u>)}i(w) _1 j 

and 

L;«(<*>) = I(uj;u)i(uy 1 j, 

where j = j(u>) = -l uw (u>). 
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Denote by U the approximation to the statistic U based on the above quantities, 
and then denote 

R* = R + R-HogiU/R). 

The quantity R* can be used in approximation ([1]). This represents a correction 
similar to that of ([3]), with expectations of quantities replaced by sample means. 
Under some assumptions plus model regularity properties, R* is distributed accord- 
ing to a standard normal distribution, with error 0(n _1 ), conditionally on a, the 
observed value of the ancillary A. However, the construction of R* does not require 
the specification of A. Again, the alternative approximation J2J) is also available as 



2.3. DiCiccio and Martin's modification 

DiCiccio and Martin [8[ proposed an alternative variable to U, denoted by T, which 
is available without specification of the ancillary A. The modification for approxi- 
mation (JTJ) is 

(5) R + = R + R- 1 log(T/R), 

where T is defined in . As with ([3]) , the final term in R + represents the departure 
from normality; unlike |3]), this measure represents the departure of the posterior 
for tp from normality, and involves the prior distribution. Once again, one might 
use the alternative probability approximation ^ with T substituting the place 
of U. The replacement of T avoids the necessity of specifying A in calculating U 
and hence simplifies the calculations. The derivation of T involves the Bayesian 
approach to constructing confidence limits considered by Welch and Peers [1 51 ] and 
Peers [11]. When oj = ip, that is, when the entire parameter is scalar and there 
are no nuisance parameters, Welch and Peers [HI showed that the appropriate 
choice is it H cx {i(u)} 1/2 , where = E{- d 2 i(w)/ du 2 }. In the presence of 
nuisance parameters, Peers [Tljj showed that w(u>) must be chosen to satisfy the 
partial differential equation 

(6) E^(^ 11 )- 1/2 ^jaog 7 r)+E^j{^^ 11 )- 1/2 } = o, 

3=1 3=1 

where ijk(<+>) — F,{—d 2 l(uj)/duj : >duj k } and (P ) is the d X d matrix inverse of (ijk)- 
The variable T is defined as 

(7\ T-l (,h * \ \ - l xx(^o^Xo)\ 1/2 A^) 

Here l^{u>) = dl{uj) / dif) , and 7r(u;) is a proper prior density for uj = (ip,x) which 
satisfies the equation ([6]). Then the resulting approximation ([2]) is 

P(V> > Vo|A) = $(i?) + (R- 1 - T- l )<p{R) + 0(n- 3/2 ), 

where T = U + O p (n _1 ), and thus the approximation ([T]) to the conditional distri- 
bution based on R + log(T/i?) has error of order 0(n^ 1 ). To error of the order 
O p (n~ 1 ), T is parameterization invariant under transformations u) i— > {ip, t(u>)}. 

Parameter orthogonality may make a big difference in solving the partial differ- 
ential equation Orthogonality is defined with respect to the expected Fisher 
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information matrix. We define Q\ to be orthogonal to 62 if the elements of the 
information matrix satisfy 

for s = 1, . . . ,px, t = pi + 1, . . . ,pi + P2j where = (<?i, #2); #i and 02 are of 
length pi and ^2 respectively. If equation §8§ is to hold for all 6 in the parameter 
space, then the parameterization is sometimes called globally orthogonal. If ||SJ) 
holds at only one parameter value 6q, then the vectors B\ and 62 are said to be 
locally orthogonal at 6q. The most direct statistical interpretation of ([5]) is that the 
relevant components of the statistic are uncorrelated. 

The definition of orthogonality can be extended to more than two sets of pa- 
rameters, and in particular 9 is totally orthogonal if the information matrix is 
diagonal. In general, it is not possible to have total parameter orthogonality at all 
parameter values, but it is possible to obtain orthogonality of a scalar parameter 
of interest ip to a set of nuisance parameters. If the parameter of interest and the 
nuisance parameter vector are orthogonal, solving the partial differential equation 
© is relatively easier. The equation © reduces to 

(9) ( l ^)- 1 /^(log^) + A ( ^ ) -i/2^ 0! 

whose solutions are of the form n(ip,x) oc {i^(ip, x)} 1 ^ 2 .9(x) (Tibshirani 
where g(x) is arbitrary and the suggestive notation i^^^tp, x) is used in place 
of iii(ip,x)- I n some cases in which the parameters are not orthogonal, solving 
equation ([6]) numerically is problematic. 



2-4- Conditional likelihood ratio statistic and its modification 

For ip and x orthogonal, Cox and Reid defined the conditional likelihood ratio 
statistic for testing ip = ipo as W = 2{l(ip) — l(ipo)}, where 

l(ip) = l(ip,x<p) - - log I - l xx (ip,Xi,)\ 

and ip is the point at which the function l(ip) is maximized. The signed root of 

the conditional likelihood ratio statistic is R = sgn(?/5 — ipo)W 1 ^ 2 , and the standard 
normal approximation to the distribution of R has error of order 0(rt -1 / 2 ). Let 

R + = R + R \og(T/R). One may use approximations |T]) and ([2|), say, <I>(i? + ) or 

+ - T" 1 ), where 

T = /( l )(^o){-r (2) w}- 1/2 4^ I 

7T(V0, AO J 

and K 3 ') = d J l(ip)/ d ip^ j — 1,2. Those approximations have errors of order 
0(n- x ). 

The use of R and its modifications is often effective in situations where there are 
many nuisance parameters. In such cases, the use of R and its modified versions 
can produce unsatisfactory results; see DiCiccio, Field and Fraser [7( for examples. 
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3. Example: Exponential samples with orthogonal interest and 
nuisance parameters 

Let X and Y be exponential random variables with means fi and v respectively; the 
ratio of the means v/a is the parameter of interest. The parameter transformation 

/i — > -j= , v — ► \\p§ \ makes the two new parameters tp and A orthogonal. Then 



X and Y have expectations Xip~^ and AV> 5 , respectively. 

Suppose we have n independent replications of (X,Y). Denote oj — (i/j,X). We 
can obtain the log-likelihood function as 



Each of the approximations in section 2 may be used to generate an approxi- 
mate one-sided p- value by approximating P[R > r], for r the observed value of R. 
Approximate two-sided p- values may be calculated by approximating 2min(P[i? > 
r], P[R < r]). One and two-sided hypotheses tests of size a may be constructed by 
rejecting the null hypothesis when the p-value is less than a. Both the Barndorff- 
Nielson format approximation (fTJ) and the Lugannani and Rice format approxima- 
tion ^ were considered. We calculated via simulation the size of tests constructed 
as above, and compared the results among different approximations. 

Some of the approximations in section 2 require specific algebraic calculations. 
We present the related calculations below. Other applications are generic, and no 
specific algebraic calculations are needed. 



3.1. Some algebraic calculations 



BarndorfF-Nielsen's modification 

The expectations of the sufficient statistics T = (X, Y) in the new parameteri- 
zation are T(ip,X) = { A/^/0, Ay 7 ?/?} , and 

dT(<A,A)W = {-A/(2V>i),A/(2^)}. 

A vector perpendicular to this is (dT(ip, X)/ dip) 1 - = {^>, 1} . The variance of the 
sample mean vector is 

1 (X 2 /ip 
X 2 ip 



n 

In our case, B(ip) = y/n {^/(v^A), f /(Aa/2?)} , and 

A = V2^(VXY/X - 1 
Using Barndorff-Nielsen's formula 

\a+ V2n\(ip + tp) 



l(ip;ip,a) 



2 log A, 



2nipip 



and 



u> = sign(> - (a + V2n)(ft -ft) /(^ 1/4 n 1/2 ). 
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The negative of the second derivative of the log likelihood is 



j(ip)= (a + V2n) (3ip- V0/(4y 2™/^ 5 ), 
and the derivative of l(ip;ip,a) with respect to ip is 



{a + V2n) (ip-ip)/(2\j2n^ 3 ). 
Then the quantity z contributing to the tail probability approximation is z 



a + V2n\ (ip - ^)/(2\/2nV>V>) 



DiCiccio and Martin's modification 

Based on the above, the information matrix is 



t(ui) = E[-/"(w)] 



l/2^ 2 
2/\< 



The maximum likelihood estimators are tp = Y/X and A = (ipX + F)/(2-0 1 / 2 ) = 
Vx Y . For fixed ip, let be the constrained maximum likelihood of A. Here, 
A^ = {^X + Y)/{2xjj 1 / 2 ). If V = Vo = 1, then A v , = A = (X+F)/2. 

In this case, the parameters are orthogonal. Using the simplified partial differ- 
ential equation ([9]), we chose g{\) — 1, and hence n(ip, A) = \/n/(y/2 ip). 

In addition to use the prior solved from equation ([5]), we also studied the outcome 
from a uniform prior, that is to say, the prior with a constant density, which is 
obviously not a solution to equation ©. 



3.2. Simulation results 

Simulation procedure 

For sample size n = 10, 

(1) Generate 10 draws from the pair of {X, Y}, where X and Y both follow stan- 
dard exponential distribution; 

(2) Calculate one and two-sided p-values for each approximation; 

(3) Compare the p-values to the a level, say 0.05; denote by q the number of miss 
coverages; if the p- value is less than 0.05, then q = q + 1; 

(4) Repeat steps (l)-(3) for 10,000 times and report the final value of q; let q* — 
(q/10000) * 100, the Type I error probability in percentage. 

(5) Repeat steps (l)-(4) for 100 times and report the average of q* as the measure- 
ment of the accuracy for the modifications. 

Approximations {1} and ([2]) have a removable singularity at R = 0. Consequently, 
these and similar formulae require care when evaluating near R = 0. Specifically, we 
found (fT]) and ([2|) to exhibit adequate numerical stability as long as \R\ > 10~ 4 . Out 
of 1,000,000 simulated data sets, 60 presented R (or a modification of R) closer to 
zero. In these cases, for all but the most extreme conditioning events, the resulting 
conditional p- value is large enough as to not imply rejection of the null hypothesis, 
and so these simulated data sets were treated as not implying rejection of the null 
hypothesis. 
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Table 1 
Type I error probability (BN) 





One-sided 


Two-sided 


jrvJJJJl KJJ^. 1 11 lei L 1UJ 1 


rl Vcl dfcife: 


rl Vcl < 1 v ' 


*(Jl) 


5.241 


5.168 




4.807 


4.575 


^TJ + TTMogCrj/i?)) 


5.046 


4.760 


*(B + il- 1 log(f)'/fl)) 


5.018 


4.882 


<&(R + -R -1 log(T/R)) 


4.615 


4.312 




11.017 


6.828 


*CR + fl _1 log(T/R)) 


4.883 


4.411 


$(7? + 7T 1 log (T„/R)) 


11.723 


7.249 




Table 2 




Type 7 


error probability (LR) 





Approximat 


ion 




One-sided 
Average 


Two-sided 
Average 


0(B) 






5.241 


5.168 








4.807 


4.575 


0(ii) + (p{R)(R^ 1 


- tr 




5.046 


4.760 


$(/?) + </>(#) (iT 1 


- r>- 


- 1 ) 


5.017 


4.881 


0(-R) + 0(-R)(-R _1 


- T- 




4.613 


4.308 






*) 


11.274 


6.943 


0(R) + 0(iJ)("R _1 


-T" 


x ) 


4.878 


4.403 


0(7?) + 0(iJ)(H _1 


— T~ 


x ) 


12.190 


7.510 



Results 

Tables 1 and 2 below report the average of the Type I error probabilities (in 
percentage) of the 100 rounds simulation. The quantities T u and T u are assumed 
with uniform prior densities. 

From the tables, we can see that for both the Barndorff-Nielsen format approxi- 
mation (BN) and the Lugannani and Rice format approximation (LR), the empiri- 
cal adjustment has best performance. Barndorff-Nielsen's modification has the best 
asymptotic error rate (0(n -3 / 2 ) rather than 0(n -1 )), and hence we might expect 
that the best performance from this approximation. Instead we observe the best 
performance from other modifications with worse asymptotic error. The authors 
hope to explore this discrepancy in later work. 

One may also notice that the performance of DiCiccio and Martin's modification 
is not as good as expected. One explanation could be that, generally in small 
sample settings Bayesian method is more sensitive to the choice of the prior density 
than in the large sample situations. The importance of the choice of prior can be 
demonstrated by the poor performance of the approximations with the incorrect 
uniform priors. 
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